Identifying the MAGs with the highest expression: roughly the top 15 MAGs look to be of interest. These are Cyanobacteria and Proteobacteria. With a cutoff of roughly 1000 TPM for the summed whole-year expression (which is very little), the most significant contributions come from essentially 6 MAGs. bin236, the Aphanizomenon, is the most prominent. Two Proteobacteria (Pelagibacteraceae) are also among the more prominent MAGs.
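The ranking step above can be sketched as follows. This is a minimal standalone illustration in Python, not the code used for the figures: the record layout, the `rank_mags` helper, `bin999`, and all TPM values are hypothetical. Only the bin names bin236, bin109 and bin104 and the rough 1000 TPM cutoff come from these notes.

```python
from collections import defaultdict

# Hypothetical long-format records: (MAG, sample date, TPM).
# TPM values are invented for illustration only.
records = [
    ("bin236", "2016-03-01", 900.0),
    ("bin236", "2016-07-01", 2500.0),
    ("bin109", "2016-07-01", 1400.0),
    ("bin104", "2016-04-01", 1100.0),
    ("bin999", "2016-04-01", 40.0),  # hypothetical low-expression MAG
]

CUTOFF_TPM = 1000.0  # approximate whole-year cutoff mentioned in the notes


def rank_mags(rows, cutoff):
    """Sum TPM per MAG over all samples, keep MAGs at or above the cutoff,
    and return (MAG, total) pairs sorted from highest to lowest total."""
    totals = defaultdict(float)
    for mag, _date, tpm in rows:
        totals[mag] += tpm
    kept = [(mag, total) for mag, total in totals.items() if total >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


ranked = rank_mags(records, CUTOFF_TPM)
for mag, total in ranked:
    print(f"{mag}\t{total:.0f}")
```

Summing before filtering means the cutoff applies to the whole-year total, so a MAG with one strong seasonal pulse is kept even if it is nearly silent the rest of the year.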

Looking at heterotrophic vs autotrophic expression.

What is interesting here is the extra peak in phosphonate metabolism that the autotrophic community shows compared with the heterotrophic community. It would be very interesting to check this in the whole-community data, since only a few MAGs are responsible for this expression (hardly representative of the big picture). Autotrophic and heterotrophic expression appear to occur in tandem.

Filtering step. Plotting the cyanobacterial MAGs shows that there are essentially only two cyanobacterial MAGs of interest, bin236 and bin109, which are Aphanizomenon and N. spumigena respectively. For the Proteobacteria, these plots show only two interesting MAGs, both Pelagibacteraceae.

Diving into the heterotrophic fraction of the P-gene expression

Plotting Aphanizomenon and N. spumigena: N. spumigena shows clear expression peaks in summer and autumn, and covers a wider range of phosphonates than Aphanizomenon.

Seeing these dynamics, it is also important to check how the MAGs themselves behave by looking at their total expression activity. The Aphanizomenon bin is active more or less all year round, while N. spumigena is active mainly in summer, which explains the pulses that appear in the P metabolism: it seems to be doing everything at the same time. For Aphanizomenon, phosphonate metabolism occurs mostly outside of winter, which is also what we would expect.

#Fig 7 and 8

Another interesting question is whether expression from the Aphanizomenon bin correlates with the abundance of Nostocophyceae (activity and expression correlation). It does not show a clear relationship (TPM divided by 2000 to match the scale of the biomass). Here I will also add N. spumigena.
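One point worth keeping in mind about the division by 2000: it only changes how the two curves overlay in the plot. A linear (Pearson) correlation is unaffected by rescaling one variable by a positive constant, which the sketch below checks in Python. The `pearson` helper and all numbers are hypothetical and purely illustrative, and whether Pearson is the right measure here is itself an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented example values, one per sampling date.
tpm = [12000.0, 30000.0, 8000.0, 22000.0, 15000.0]
biomass = [4.0, 9.5, 6.0, 7.0, 3.5]

# Same rescaling as in the notes: TPM divided by 2000 for plotting.
tpm_scaled = [value / 2000 for value in tpm]

r_raw = pearson(tpm, biomass)
r_scaled = pearson(tpm_scaled, biomass)
print(r_raw, r_scaled)  # the two coefficients agree up to rounding error
```

So the lack of a clear relationship cannot be an artefact of the chosen scaling factor.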

Investigating the Proteobacteria belonging to the family Pelagibacteraceae: phosphonate genes seem to peak in late spring/early summer. The first MAG appears to show a succession, with phosphonate expression in spring/summer, followed by phosphatase expression in summer/autumn, and finally Pi uptake in winter, indicating that this MAG can metabolise both organic and inorganic P.

  • As is clear, the majority of expression and activity for Pelagibacteraceae 1 (bin104) occurs during spring, indicating that it mainly thrives on organic P, while inorganic P seems to correlate somewhat with the winter expression.
  • The second Proteobacteria MAG shows summer and autumn as the dominant periods of activity. It also shows something of a preference for Pi in winter; however, no phosphonates were found in the expression, and instead there is more consistent expression of P sensing and regulation. Common to the MAGs is that phnD appears to be the major player in phosphonate metabolism expression (wait for hmmer, which may show something different), and pstS is the key expressed gene in uptake.

#Fig 12, 13

Various exploratory analyses in progress. (End of document here)

This figure shows that P-gene expression does appear to be seasonal, and that the pattern recurs across the different years (very nice). The PCA is based on all P-genes; it could be an idea to split them up by category later and look at each group separately.

Correlation of genes with other parameters. Unfortunately, the correlation of Pi with the genes is very low (best is 0.). On the other hand, some genes seem to correlate with nitrogen. This possibly has to do with the fact that a MAG may be very productive when N is abundant, so that expression of these genes increases as well, as N might be the limiting factor: a sign of N limitation(?). Here the correlation is done for Aphanizomenon. In the NMDS, the Julian category refers to the Julian day of the year, i.e. 1-365.
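For reference, the day-of-year value behind the Julian category can be derived from a sampling date as sketched below in Python. The `day_of_year` helper and the example dates are hypothetical; how the value was actually computed for the NMDS is not shown in these notes.

```python
from datetime import date


def day_of_year(sample_date):
    """Return the ordinal day within the year (1 January = 1)."""
    return sample_date.timetuple().tm_yday


# Invented example dates.
for d in (date(2015, 1, 1), date(2015, 12, 31), date(2016, 12, 31)):
    print(d.isoformat(), day_of_year(d))
```

Note that leap years run to day 366, so samples taken after 29 February in a leap year are shifted by one day relative to the same calendar date in other years.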

## Call: rda(formula = aphani.wide.hellinger ~ Chla_Average +
## Nitrate_Average + Phosphate_Average, data = lmo_date_prep, na.action =
## na.exclude)
## 
##               Inertia Proportion Rank
## Total         0.04900    1.00000     
## Constrained   0.02064    0.42131    3
## Unconstrained 0.02836    0.57869    8
## Inertia is variance 
## 2 observations deleted due to missingness 
## 
## Eigenvalues for constrained axes:
##     RDA1     RDA2     RDA3 
## 0.017599 0.002145 0.000900 
## 
## Eigenvalues for unconstrained axes:
##      PC1      PC2      PC3      PC4      PC5      PC6      PC7      PC8 
## 0.010462 0.008564 0.004124 0.002322 0.001457 0.001155 0.000198 0.000074
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = aphani.wide.hellinger ~ Chla_Average + Nitrate_Average + Phosphate_Average, data = lmo_date_prep, na.action = na.exclude)
##          Df  Variance       F Pr(>F)    
## RDA1      1 0.0175991 16.7570  0.001 ***
## RDA2      1 0.0021453  2.0427  0.257    
## RDA3      1 0.0009003  0.8572  0.499    
## Residual 27 0.0283568                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
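
As a reading aid for the output above, the sketch below recomputes two of the printed quantities from the other printed numbers: the constrained proportion (constrained inertia divided by total inertia) and the per-axis F values (axis variance on 1 Df divided by the residual variance per residual Df). It is plain Python arithmetic on values copied from the output, not a re-run of the analysis; the variable names are my own.

```python
# Values copied from the printed rda() and permutation test output.
total_inertia = 0.04900
constrained_inertia = 0.02064
axis_variance = {"RDA1": 0.0175991, "RDA2": 0.0021453, "RDA3": 0.0009003}
residual_variance = 0.0283568
residual_df = 27

# Share of the total variance explained by the three constraints together.
proportion_constrained = constrained_inertia / total_inertia

# Each axis has Df = 1, so its mean square equals its variance.
residual_mean_square = residual_variance / residual_df
f_values = {
    axis: variance / residual_mean_square
    for axis, variance in axis_variance.items()
}

print(f"constrained proportion: {proportion_constrained:.3f}")
for axis, f_value in f_values.items():
    print(f"{axis}: F = {f_value:.3f}")
```

The results match the printed values up to rounding of the inputs: about 42% of the variance is constrained, and almost all of it sits on RDA1, which is the only axis with a low permutation p-value.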

Variable            rho                   pValue
Temperature_C       -0.828673575761407    4.72919345312975e-09
cDOM_Average        -0.146341467855824    0.44875044738141
Chla_Average        -0.348054050230524    0.0550190614327223
DOC_Average         -0.645552608505255    0.000492136931040992
Nitrate_Average     0.569202576016967     0.000674312797015418
Phosphate_Average   1                     0
ko:K01077           -0.254722175565038    0.159448301155494
ko:K02036           -0.0504170870208821   0.784062115461511
ko:K02039           -0.0690327392052558   0.707348273615104
ko:K02040           0.388084332827027     0.0281759731160455
ko:K02041           -0.258206496181118    0.153622798531812
ko:K02044           -0.404399640158914    0.0216952694245981
ko:K06217           0.476441712764375     0.00583918989148335
ko:K07636           -0.143222543088269    0.434215031999715
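
The column name rho suggests a rank correlation such as Spearman's against Phosphate_Average (the Phosphate_Average row is then simply the variable correlated with itself, hence rho = 1). That reading is an assumption, since the call that produced the table is not shown. Under that assumption, a minimal Python sketch of the coefficient, with average ranks for ties, looks like this. The helper names and example values are hypothetical, and the p-value is deliberately left out.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


# Invented example values, one per sampling date.
phosphate = [0.6, 0.5, 0.2, 0.1, 0.1, 0.3]
temperature = [2.0, 3.0, 10.0, 16.0, 18.0, 8.0]
print(spearman_rho(phosphate, temperature))
```

Because it works on ranks, this measure picks up any monotonic relationship, not only linear ones, which suits strongly seasonal variables such as temperature and phosphate.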

Now repeat the same as above, but for the Proteobacteria instead: the MAG referred to as pelagi 1 in earlier figures.

Nothing correlated well here either. It will probably not make much of a difference, but it may be worth grouping the genes here as well to see whether the correlation gets stronger (or worse).

## Call: rda(formula = pelagi1.wide.hellinger ~ Chla_Average +
## Nitrate_Average + Phosphate_Average, data = env_variables, na.action =
## na.exclude)
## 
##               Inertia Proportion Rank
## Total         0.23713    1.00000     
## Constrained   0.03427    0.14453    3
## Unconstrained 0.20286    0.85547    8
## Inertia is variance 
## 2 observations deleted due to missingness 
## 
## Eigenvalues for constrained axes:
##     RDA1     RDA2     RDA3 
## 0.023329 0.010617 0.000326 
## 
## Eigenvalues for unconstrained axes:
##     PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8 
## 0.11617 0.05189 0.02069 0.00634 0.00297 0.00216 0.00156 0.00109
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = pelagi1.wide.hellinger ~ Chla_Average + Nitrate_Average + Phosphate_Average, data = env_variables, na.action = na.exclude)
##          Df Variance      F Pr(>F)
## RDA1      1 0.023329 2.8750  0.294
## RDA2      1 0.010617 1.3084  0.582
## RDA3      1 0.000326 0.0402  1.000
## Residual 25 0.202859
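
Both rda() calls take a response matrix whose name ends in `.hellinger`, which suggests the expression matrix was Hellinger-transformed beforehand. That is an assumption based only on the object names. The transformation itself is simple: each value is divided by its row (sample) total and then square-rooted. A minimal Python sketch follows, with a hypothetical `hellinger` helper and invented values; returning zeros for an all-zero sample is my own choice for the sketch.

```python
import math


def hellinger(matrix):
    """Hellinger-transform a samples x genes matrix given as lists of rows:
    each entry becomes sqrt(value / row total)."""
    transformed = []
    for row in matrix:
        total = sum(row)
        if total == 0:
            # A sample with no expression at all: keep it as zeros here.
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(value / total) for value in row])
    return transformed


# Invented TPM values: two samples (rows) by three genes (columns).
expression = [
    [10.0, 30.0, 0.0],
    [5.0, 5.0, 90.0],
]
for row in hellinger(expression):
    print([round(value, 3) for value in row])
```

After the transformation the squared values in each non-empty row sum to 1, so samples are compared on relative gene composition rather than on total expression level.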

Variable            rho                   pValue
Temperature_C       -0.818424566088117    3.28970185291561e-08
cDOM_Average        -0.225309115737528    0.258498515499418
Chla_Average        -0.37516935867772     0.0449194002608719
DOC_Average         -0.594855469229526    0.00275425824525371
Nitrate_Average     0.538435869390334     0.00214450278151747
Phosphate_Average   1                     1.73407283747191e-216
ko:K02036           -0.118011079127091    0.534544194618961
ko:K02038           -0.30120885408807     0.105769212981798
ko:K02039           -0.289583895919957    0.120616324477844
ko:K02040           0.232654087247159     0.21600738789824
ko:K02041           -0.354402674695823    0.0546617010178555
ko:K02044           0.0865204305067192    0.649390086211261
ko:K06217           -0.39158738853315     0.0323571526687303
ko:K19670           -0.428802068268948    0.0180614789864436

#Other ideas: I want to correlate the P-related genes in the MAGs with the other genes in them to see if any correlate tightly, suggesting a potential relationship. Another variant would be a network analysis of the CDS.